Overview

Dataset statistics

Number of variables: 30
Number of observations: 3772
Missing cells: 6064
Missing cells (%): 5.4%
Duplicate rows: 33
Duplicate rows (%): 0.9%
Total size in memory: 884.2 KiB
Average record size in memory: 240.0 B
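
The percentage and average figures in this table follow from the raw counts. The sketch below recomputes them using only numbers copied from the table; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics table.
n_variables = 30
n_observations = 3772
missing_cells = 6064
duplicate_rows = 33
total_size_kib = 884.2

total_cells = n_variables * n_observations            # every cell in the table
missing_pct = 100 * missing_cells / total_cells        # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations  # share of duplicate rows
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the three results agree with the reported 5.4%, 0.9% and 240.0 B.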

Variable types

Numeric: 6
Categorical: 3
Boolean: 20
Unsupported: 1

Alerts

TBG_measured has constant value "False" [Constant]
Dataset has 33 (0.9%) duplicate rows [Duplicates]
TSH is highly overall correlated with Class [High correlation]
T3 is highly overall correlated with TT4 and 2 other fields [High correlation]
TT4 is highly overall correlated with T3 and 2 other fields [High correlation]
T4U is highly overall correlated with pregnant and 1 other field [High correlation]
FTI is highly overall correlated with T3 and 2 other fields [High correlation]
pregnant is highly overall correlated with T4U [High correlation]
psych is highly overall correlated with referral_source [High correlation]
TSH_measured is highly overall correlated with T3_measured and 3 other fields [High correlation]
T3_measured is highly overall correlated with TSH_measured and 1 other field [High correlation]
TT4_measured is highly overall correlated with TSH_measured and 3 other fields [High correlation]
T4U_measured is highly overall correlated with TSH_measured and 2 other fields [High correlation]
FTI_measured is highly overall correlated with TSH_measured and 2 other fields [High correlation]
referral_source is highly overall correlated with psych [High correlation]
Class is highly overall correlated with TSH and 2 other fields [High correlation]
sex has 150 (4.0%) missing values [Missing]
TSH has 369 (9.8%) missing values [Missing]
T3 has 769 (20.4%) missing values [Missing]
TT4 has 231 (6.1%) missing values [Missing]
T4U has 387 (10.3%) missing values [Missing]
FTI has 385 (10.2%) missing values [Missing]
TBG has 3772 (100.0%) missing values [Missing]
TBG is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2023-03-07 05:15:28.488424
Analysis finished: 2023-03-07 05:16:37.989797
Duration: 1 minute and 9.5 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

age
Real number (ℝ)

Distinct: 93
Distinct (%): 2.5%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.735879
Minimum: 1
Maximum: 455
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 20
Q1: 36
Median: 54
Q3: 67
95-th percentile: 79
Maximum: 455
Range: 454
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 20.084958
Coefficient of variation (CV): 0.38822107
Kurtosis: 41.86283
Mean: 51.735879
Median Absolute Deviation (MAD): 15
Skewness: 1.9558145
Sum: 195096
Variance: 403.40555
Monotonicity: Not monotonic
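
Several of the statistics listed for age are derived from the others. As a rough consistency check, the sketch below recomputes the coefficient of variation, variance, interquartile range and range from the reported mean, standard deviation, quartiles and extremes. The inputs are the rounded values printed in the report, so agreement is only expected to a few decimal places.

```python
# Values copied from the age statistics in this report.
mean = 51.735879
std = 20.084958
q1, q3 = 36, 67
minimum, maximum = 1, 455

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV = {cv:.5f}, variance = {variance:.3f}, IQR = {iqr}, range = {value_range}")
```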
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
59                 95     2.5%
60                 91     2.4%
70                 90     2.4%
73                 81     2.1%
55                 81     2.1%
63                 78     2.1%
72                 77     2.0%
58                 77     2.0%
62                 75     2.0%
61                 74     2.0%
Other values (83)  2952   78.3%

Minimum 10 values

Value  Count  Frequency (%)
1      6      0.2%
2      4      0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      5      0.1%
8      3      0.1%
10     1      < 0.1%
11     4      0.1%
12     4      0.1%

Maximum 10 values

Value  Count  Frequency (%)
455    1      < 0.1%
94     2      0.1%
93     2      0.1%
92     2      0.1%
91     2      0.1%
90     5      0.1%
89     8      0.2%
88     9      0.2%
87     12     0.3%
86     6      0.2%

sex
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 150
Missing (%): 4.0%
Memory size: 29.6 KiB

Value  Count
F      2480
M      1142

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 3622
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: M
4th row: F
5th row: F

Common Values

Value      Count  Frequency (%)
F          2480   65.7%
M          1142   30.3%
(Missing)  150    4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count  Frequency (%)
f      2480   68.5%
m      1142   31.5%

Most occurring characters

Value  Count  Frequency (%)
F      2480   68.5%
M      1142   31.5%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  3622   100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
F      2480   68.5%
M      1142   31.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3622   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
F      2480   68.5%
M      1142   31.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3622   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
F      2480   68.5%
M      1142   31.5%

on_thyroxine
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3308   87.7%
True   464    12.3%

query_on_thyroxine
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3722   98.7%
True   50     1.3%

on_antithyroid_medication
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3729   98.9%
True   43     1.1%

sick
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3625   96.1%
True   147    3.9%

pregnant
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3719   98.6%
True   53     1.4%

thyroid_surgery
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3719   98.6%
True   53     1.4%

I131_treatment
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3713   98.4%
True   59     1.6%

query_hypothyroid
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3538   93.8%
True   234    6.2%

query_hyperthyroid
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3535   93.7%
True   237    6.3%

lithium
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3754   99.5%
True   18     0.5%

goitre
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3738   99.1%
True   34     0.9%

tumor
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3676   97.5%
True   96     2.5%

hypopituitary
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3771   > 99.9%
True   1      < 0.1%

psych
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3588   95.1%
True   184    4.9%

TSH_measured
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
True   3403   90.2%
False  369    9.8%

TSH
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 287
Distinct (%): 8.4%
Missing: 369
Missing (%): 9.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.0867661
Minimum: 0.005
Maximum: 530
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.6 KiB

Quantile statistics

Minimum: 0.005
5-th percentile: 0.025
Q1: 0.5
Median: 1.4
Q3: 2.7
95-th percentile: 13
Maximum: 530
Range: 529.995
Interquartile range (IQR): 2.2

Descriptive statistics

Standard deviation: 24.52147
Coefficient of variation (CV): 4.8206405
Kurtosis: 238.18146
Mean: 5.0867661
Median Absolute Deviation (MAD): 1.04
Skewness: 13.882653
Sum: 17310.265
Variance: 601.30251
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0.2                 116    3.1%
1.3                 105    2.8%
1.1                 97     2.6%
1.4                 91     2.4%
1.5                 80     2.1%
1.9                 79     2.1%
1.2                 79     2.1%
1.6                 78     2.1%
1.7                 73     1.9%
2.3                 70     1.9%
Other values (277)  2535   67.2%
(Missing)           369    9.8%

Minimum 10 values

Value  Count  Frequency (%)
0.005  52     1.4%
0.01   24     0.6%
0.015  26     0.7%
0.02   55     1.5%
0.025  17     0.5%
0.03   25     0.7%
0.035  19     0.5%
0.04   17     0.5%
0.045  13     0.3%
0.05   50     1.3%

Maximum 10 values

Value  Count  Frequency (%)
530    1      < 0.1%
478    1      < 0.1%
472    1      < 0.1%
468    1      < 0.1%
440    1      < 0.1%
400    1      < 0.1%
236    1      < 0.1%
230    1      < 0.1%
199    1      < 0.1%
188    1      < 0.1%

T3_measured
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
True   3003   79.6%
False  769    20.4%

T3
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 69
Distinct (%): 2.3%
Missing: 769
Missing (%): 20.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0134998
Minimum: 0.05
Maximum: 10.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.6 KiB

Quantile statistics

Minimum: 0.05
5-th percentile: 0.8
Q1: 1.6
Median: 2
Q3: 2.4
95-th percentile: 3.4
Maximum: 10.6
Range: 10.55
Interquartile range (IQR): 0.8

Descriptive statistics

Standard deviation: 0.82743419
Coefficient of variation (CV): 0.41094326
Kurtosis: 9.8680496
Mean: 2.0134998
Median Absolute Deviation (MAD): 0.4
Skewness: 1.730874
Sum: 6046.54
Variance: 0.68464734
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
2                  238    6.3%
1.8                207    5.5%
2.2                201    5.3%
1.9                189    5.0%
2.1                184    4.9%
2.3                183    4.9%
1.6                159    4.2%
1.7                157    4.2%
1.5                141    3.7%
2.4                137    3.6%
Other values (59)  1207   32.0%
(Missing)          769    20.4%

Minimum 10 values

Value  Count  Frequency (%)
0.05   2      0.1%
0.1    2      0.1%
0.2    18     0.5%
0.3    22     0.6%
0.4    20     0.5%
0.5    16     0.4%
0.6    20     0.5%
0.7    32     0.8%
0.8    40     1.1%
0.9    42     1.1%

Maximum 10 values

Value  Count  Frequency (%)
10.6   1      < 0.1%
8.5    1      < 0.1%
7.6    1      < 0.1%
7.3    1      < 0.1%
7.1    2      0.1%
7      1      < 0.1%
6.7    1      < 0.1%
6.6    1      < 0.1%
6.2    1      < 0.1%
6.1    1      < 0.1%

TT4_measured
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
True   3541   93.9%
False  231    6.1%

TT4
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 241
Distinct (%): 6.8%
Missing: 231
Missing (%): 6.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 108.31934
Minimum: 2
Maximum: 430
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.6 KiB

Quantile statistics

Minimum: 2
5-th percentile: 63
Q1: 88
Median: 103
Q3: 124
95-th percentile: 170
Maximum: 430
Range: 428
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 35.604248
Coefficient of variation (CV): 0.32869704
Kurtosis: 6.6184389
Mean: 108.31934
Median Absolute Deviation (MAD): 18
Skewness: 1.267704
Sum: 383558.8
Variance: 1267.6624
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
101                 71     1.9%
93                  67     1.8%
103                 63     1.7%
98                  63     1.7%
102                 59     1.6%
87                  59     1.6%
94                  56     1.5%
91                  56     1.5%
99                  55     1.5%
120                 54     1.4%
Other values (231)  2938   77.9%
(Missing)           231    6.1%

Minimum 10 values

Value  Count  Frequency (%)
2      1      < 0.1%
2.9    1      < 0.1%
3      2      0.1%
4      1      < 0.1%
4.8    1      < 0.1%
5.8    2      0.1%
6      1      < 0.1%
9.5    1      < 0.1%
10     5      0.1%
11     2      0.1%

Maximum 10 values

Value  Count  Frequency (%)
430    2      0.1%
372    1      < 0.1%
301    1      < 0.1%
289    1      < 0.1%
273    1      < 0.1%
272    1      < 0.1%
263    1      < 0.1%
261    1      < 0.1%
258    1      < 0.1%
257    1      < 0.1%

T4U_measured
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
True   3385   89.7%
False  387    10.3%

T4U
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 146
Distinct (%): 4.3%
Missing: 387
Missing (%): 10.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9949997
Minimum: 0.25
Maximum: 2.32
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.6 KiB

Quantile statistics

Minimum: 0.25
5-th percentile: 0.74
Q1: 0.88
Median: 0.98
Q3: 1.08
95-th percentile: 1.34
Maximum: 2.32
Range: 2.07
Interquartile range (IQR): 0.2

Descriptive statistics

Standard deviation: 0.19545728
Coefficient of variation (CV): 0.19643953
Kurtosis: 4.0734715
Mean: 0.9949997
Median Absolute Deviation (MAD): 0.1
Skewness: 1.2326742
Sum: 3368.074
Variance: 0.038203546
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0.99                95     2.5%
0.9                 93     2.5%
1.01                91     2.4%
1                   90     2.4%
0.92                89     2.4%
0.93                88     2.3%
0.97                88     2.3%
1.02                87     2.3%
0.91                85     2.3%
0.95                83     2.2%
Other values (136)  2496   66.2%
(Missing)           387    10.3%

Minimum 10 values

Value  Count  Frequency (%)
0.25   1      < 0.1%
0.31   1      < 0.1%
0.36   1      < 0.1%
0.38   1      < 0.1%
0.41   1      < 0.1%
0.46   1      < 0.1%
0.47   1      < 0.1%
0.48   2      0.1%
0.49   1      < 0.1%
0.5    3      0.1%

Maximum 10 values

Value  Count  Frequency (%)
2.32   1      < 0.1%
2.12   1      < 0.1%
2.03   1      < 0.1%
2.01   1      < 0.1%
1.97   1      < 0.1%
1.94   1      < 0.1%
1.93   1      < 0.1%
1.88   2      0.1%
1.84   1      < 0.1%
1.83   2      0.1%

FTI_measured
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
True   3387   89.8%
False  385    10.2%

FTI
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 234
Distinct (%): 6.9%
Missing: 385
Missing (%): 10.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 110.46965
Minimum: 2
Maximum: 395
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 29.6 KiB

Quantile statistics

Minimum: 2
5-th percentile: 68
Q1: 93
Median: 107
Q3: 124
95-th percentile: 166
Maximum: 395
Range: 393
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 33.089698
Coefficient of variation (CV): 0.29953655
Kurtosis: 7.874558
Mean: 110.46965
Median Absolute Deviation (MAD): 15
Skewness: 1.3454318
Sum: 374160.7
Variance: 1094.9281
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
100                 73     1.9%
93                  70     1.9%
114                 65     1.7%
98                  64     1.7%
107                 64     1.7%
92                  63     1.7%
104                 63     1.7%
106                 59     1.6%
101                 59     1.6%
97                  59     1.6%
Other values (224)  2748   72.9%
(Missing)           385    10.2%

Minimum 10 values

Value  Count  Frequency (%)
2      1      < 0.1%
2.8    1      < 0.1%
3      2      0.1%
4      1      < 0.1%
5.4    1      < 0.1%
7      1      < 0.1%
7.6    1      < 0.1%
8.4    1      < 0.1%
8.5    1      < 0.1%
8.9    1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
395    2      0.1%
362    1      < 0.1%
349    1      < 0.1%
312    1      < 0.1%
291    1      < 0.1%
283    1      < 0.1%
281    1      < 0.1%
280    1      < 0.1%
274    1      < 0.1%
265    1      < 0.1%

TBG_measured
Boolean

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value  Count  Frequency (%)
False  3772   100.0%

TBG
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 3772
Missing (%): 100.0%
Memory size: 29.6 KiB

referral_source
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.6 KiB

Value  Count
other  2201
SVI    1034
SVHC   386
STMW   112
SVHD   39

Length

Max length: 5
Median length: 5
Mean length: 4.3093849
Min length: 3

Characters and Unicode

Total characters: 16255
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SVHC
2nd row: other
3rd row: other
4th row: other
5th row: SVI

Common Values

Value  Count  Frequency (%)
other  2201   58.4%
SVI    1034   27.4%
SVHC   386    10.2%
STMW   112    3.0%
SVHD   39     1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count  Frequency (%)
other  2201   58.4%
svi    1034   27.4%
svhc   386    10.2%
stmw   112    3.0%
svhd   39     1.0%

Most occurring characters

Value             Count  Frequency (%)
o                 2201   13.5%
t                 2201   13.5%
h                 2201   13.5%
e                 2201   13.5%
r                 2201   13.5%
S                 1571   9.7%
V                 1459   9.0%
I                 1034   6.4%
H                 425    2.6%
C                 386    2.4%
Other values (4)  375    2.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  11005  67.7%
Uppercase Letter  5250   32.3%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
S      1571   29.9%
V      1459   27.8%
I      1034   19.7%
H      425    8.1%
C      386    7.4%
T      112    2.1%
M      112    2.1%
W      112    2.1%
D      39     0.7%

Lowercase Letter
Value  Count  Frequency (%)
o      2201   20.0%
t      2201   20.0%
h      2201   20.0%
e      2201   20.0%
r      2201   20.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  16255  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
o                 2201   13.5%
t                 2201   13.5%
h                 2201   13.5%
e                 2201   13.5%
r                 2201   13.5%
S                 1571   9.7%
V                 1459   9.0%
I                 1034   6.4%
H                 425    2.6%
C                 386    2.4%
Other values (4)  375    2.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16255  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
o                 2201   13.5%
t                 2201   13.5%
h                 2201   13.5%
e                 2201   13.5%
r                 2201   13.5%
S                 1571   9.7%
V                 1459   9.0%
I                 1034   6.4%
H                 425    2.6%
C                 386    2.4%
Other values (4)  375    2.3%

Class
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.6 KiB

Value                    Count
negative                 3481
compensated_hypothyroid  194
primary_hypothyroid      95
secondary_hypothyroid    2

Length

Max length: 23
Median length: 8
Mean length: 9.0554083
Min length: 8

Characters and Unicode

Total characters: 34157
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: negative
2nd row: negative
3rd row: negative
4th row: negative
5th row: negative

Common Values

Value                    Count  Frequency (%)
negative                 3481   92.3%
compensated_hypothyroid  194    5.1%
primary_hypothyroid      95     2.5%
secondary_hypothyroid    2      0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value                    Count  Frequency (%)
negative                 3481   92.3%
compensated_hypothyroid  194    5.1%
primary_hypothyroid      95     2.5%
secondary_hypothyroid    2      0.1%

Most occurring characters

Value             Count  Frequency (%)
e                 7352   21.5%
t                 3966   11.6%
i                 3867   11.3%
a                 3772   11.0%
n                 3677   10.8%
g                 3481   10.2%
v                 3481   10.2%
o                 778    2.3%
y                 679    2.0%
h                 582    1.7%
Other values (7)  2522   7.4%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       33866  99.1%
Connector Punctuation  291    0.9%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
e                 7352   21.7%
t                 3966   11.7%
i                 3867   11.4%
a                 3772   11.1%
n                 3677   10.9%
g                 3481   10.3%
v                 3481   10.3%
o                 778    2.3%
y                 679    2.0%
h                 582    1.7%
Other values (6)  2231   6.6%

Connector Punctuation
Value  Count  Frequency (%)
_      291    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   33866  99.1%
Common  291    0.9%

Most frequent character per script

Latin
Value             Count  Frequency (%)
e                 7352   21.7%
t                 3966   11.7%
i                 3867   11.4%
a                 3772   11.1%
n                 3677   10.9%
g                 3481   10.3%
v                 3481   10.3%
o                 778    2.3%
y                 679    2.0%
h                 582    1.7%
Other values (6)  2231   6.6%

Common
Value  Count  Frequency (%)
_      291    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  34157  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
e                 7352   21.5%
t                 3966   11.6%
i                 3867   11.3%
a                 3772   11.0%
n                 3677   10.8%
g                 3481   10.2%
v                 3481   10.2%
o                 778    2.3%
y                 679    2.0%
h                 582    1.7%
Other values (7)  2522   7.4%

Interactions


Correlations


Auto

The auto setting picks an interpretable pairwise metric according to the types of the two columns being compared. Each entry below reads "Variable_type-Variable_type : Method, Range":
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used to discretize the numerical column of a Numerical-Categorical pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
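
The mapping can be read as a small dispatch rule. The sketch below is illustrative only: `auto_method` and `discretize` are hypothetical helper names, not part of pandas-profiling, and the equal-width binning is an assumption, since the report does not say how the numerical column is discretized.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Return the metric the mapping assigns to a pair of column types."""
    kinds = {type_a, type_b}
    if kinds == {"numerical"}:
        return "spearman"   # range [-1, 1]
    if kinds <= {"numerical", "categorical"}:
        return "cramers_v"  # range [0, 1]; a numerical column is binned first
    raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")


def discretize(values, n_bins):
    """Equal-width binning (an assumption, see the note above)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(auto_method("numerical", "categorical"))
print(discretize([0, 1, 2, 3, 4], n_bins=2))
```

A larger `n_bins` keeps more of the numerical column's detail but leaves fewer observations in each bin.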

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
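
A minimal sketch of that calculation in plain Python. The helper names are illustrative, and giving tied values their average rank is a common convention assumed here rather than something the report states.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # the 1/n factors cancel, so plain sums suffice


def spearman(x, y):
    """Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ is 1 up to floating-point rounding, while Pearson's r is noticeably lower because the relationship is not linear.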

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
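
The definition and the invariance property can be checked with a short sketch. The data are made up for illustration, and the rescaling uses positive scale factors (a negative factor would flip the sign of r).

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # the 1/n factors cancel, so plain sums suffice


# Hypothetical toy data.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]

r = pearson(x, y)
# Shift and rescale each variable separately; r should not change.
r_rescaled = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```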

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
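
The pair-counting definition translates directly into code. This sketch implements exactly the formula as worded above (often called τ-a), in which tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie-corrected variant instead, so results can differ on data with many ties.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:    # both variables move the same way
            concordant += 1
        elif direction < 0:  # the variables move in opposite ways
            discordant += 1
    return (concordant - discordant) / len(pairs)


print(kendall_tau([1, 2, 3], [1, 3, 2]))  # two concordant pairs, one discordant
```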

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
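
A sketch of a bias-corrected Cramér's V using the commonly cited form of Bergsma's correction. The function name is illustrative, and this is not necessarily identical to the code pandas-profiling runs.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals, col_totals = Counter(a), Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_count in row_totals.items():
        for col_label, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical toy labels: b is fully determined by a.
a = ["x"] * 50 + ["y"] * 50
b = ["p"] * 50 + ["q"] * 50
print(cramers_v_corrected(a, b))
```

A constant column (such as TBG_measured in this dataset) leaves only one category, so the denominator is not positive and the sketch returns 0.0 rather than dividing by zero.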

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
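
One way to read nullity correlation is as Pearson's r between 0/1 "value is present" indicators of two columns. This is a sketch of that interpretation, not necessarily the exact computation behind the heatmap. The toy rows use the T4U, FTI and T3 values of the first five rows shown in the Sample section, with None standing for NaN.

```python
from statistics import mean


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


rows = [
    {"T4U": 1.14, "FTI": 109, "T3": 2.5},
    {"T4U": None, "FTI": None, "T3": 2.0},
    {"T4U": 0.91, "FTI": 120, "T3": None},
    {"T4U": None, "FTI": None, "T3": 1.9},
    {"T4U": 0.87, "FTI": 70, "T3": 1.2},
]


def present(column):
    """1 where the cell has a value, 0 where it is missing."""
    return [0 if row[column] is None else 1 for row in rows]


nullity_t4u_fti = pearson(present("T4U"), present("FTI"))
nullity_t4u_t3 = pearson(present("T4U"), present("T3"))
print(nullity_t4u_fti, nullity_t4u_t3)
```

In these five rows T4U and FTI are always missing together, giving a nullity correlation of 1, while T3 is missing only where T4U is present, giving a negative value. A column with no missing values has a constant indicator, so the sketch would divide by zero for it and such columns need to be skipped.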

Sample

First rows

,age,sex,on_thyroxine,query_on_thyroxine,on_antithyroid_medication,sick,pregnant,thyroid_surgery,I131_treatment,query_hypothyroid,query_hyperthyroid,lithium,goitre,tumor,hypopituitary,psych,TSH_measured,TSH,T3_measured,T3,TT4_measured,TT4,T4U_measured,T4U,FTI_measured,FTI,TBG_measured,TBG,referral_source,Class
0,41,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,1.3,t,2.5,t,125,t,1.14,t,109,f,NaN,SVHC,negative
1,23,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,4.1,t,2,t,102,f,NaN,f,NaN,f,NaN,other,negative
2,46,M,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,0.98,f,NaN,t,109,t,0.91,t,120,f,NaN,other,negative
3,70,F,t,f,f,f,f,f,f,f,f,f,f,f,f,f,t,0.16,t,1.9,t,175,f,NaN,f,NaN,f,NaN,other,negative
4,70,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,0.72,t,1.2,t,61,t,0.87,t,70,f,NaN,SVI,negative
5,18,F,t,f,f,f,f,f,f,f,f,f,f,f,f,f,t,0.03,f,NaN,t,183,t,1.3,t,141,f,NaN,other,negative
6,59,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,t,72,t,0.92,t,78,f,NaN,other,negative
7,80,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,2.2,t,0.6,t,80,t,0.7,t,115,f,NaN,SVI,negative
8,66,F,f,f,f,f,f,f,f,f,f,f,f,t,f,f,t,0.6,t,2.2,t,123,t,0.93,t,132,f,NaN,SVI,negative
9,68,M,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,2.4,t,1.6,t,83,t,0.89,t,93,f,NaN,SVI,negative

Last rows

,age,sex,on_thyroxine,query_on_thyroxine,on_antithyroid_medication,sick,pregnant,thyroid_surgery,I131_treatment,query_hypothyroid,query_hyperthyroid,lithium,goitre,tumor,hypopituitary,psych,TSH_measured,TSH,T3_measured,T3,TT4_measured,TT4,T4U_measured,T4U,FTI_measured,FTI,TBG_measured,TBG,referral_source,Class
3762,61,M,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,t,1,t,72,t,0.7,t,103,f,NaN,other,negative
3763,41,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,other,negative
3764,54,M,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,1.2,t,1.9,t,89,t,0.85,t,104,f,NaN,SVI,negative
3765,73,F,t,f,f,f,f,f,f,f,f,f,f,f,f,f,t,8.5,t,2.1,t,104,t,1.13,t,92,f,NaN,SVI,negative
3766,19,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,8.8,t,2.7,t,108,t,1.11,t,97,f,NaN,other,compensated_hypothyroid
3767,30,F,f,f,f,f,f,f,f,f,f,f,f,t,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,other,negative
3768,68,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,1,t,2.1,t,124,t,1.08,t,114,f,NaN,SVI,negative
3769,74,F,f,f,f,f,f,f,f,f,t,f,f,f,f,f,t,5.1,t,1.8,t,112,t,1.07,t,105,f,NaN,other,negative
3770,72,M,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,0.7,t,2,t,82,t,0.94,t,87,f,NaN,SVI,negative
3771,64,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t,1,t,2.2,t,99,t,1.07,t,92,f,NaN,other,negative

Duplicate rows

Most frequently occurring

,age,sex,on_thyroxine,query_on_thyroxine,on_antithyroid_medication,sick,pregnant,thyroid_surgery,I131_treatment,query_hypothyroid,query_hyperthyroid,lithium,goitre,tumor,hypopituitary,psych,TSH_measured,TSH,T3_measured,T3,TT4_measured,TT4,T4U_measured,T4U,FTI_measured,FTI,TBG_measured,referral_source,Class,# duplicates
4,26,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,6
5,29,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,4
7,32,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,4
8,33,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,4
15,41,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,4
17,51,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,4
21,57,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,4
24,58,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,4
1,19,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,3
3,22,F,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,NaN,f,NaN,f,NaN,f,NaN,f,NaN,f,other,negative,3